{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e8c3c7da",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/agent/Chatbot_SEC.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ae56bcff",
   "metadata": {},
   "source": [
    "# 💬🤖 How to Build a Chatbot\n",
    "\n",
    "LlamaIndex serves as a bridge between your data and Language Learning Models (LLMs), providing a toolkit that enables you to establish a query interface around your data for a variety of tasks, such as question-answering and summarization.\n",
    "\n",
    "In this tutorial, we'll walk you through building a context-augmented chatbot using a [Data Agent](https://gpt-index.readthedocs.io/en/stable/core_modules/agent_modules/agents/root.html). This agent, powered by LLMs, is capable of intelligently executing tasks over your data. The end result is a chatbot agent equipped with a robust set of data interface tools provided by LlamaIndex to answer queries about your data.\n",
    "\n",
    "**Note**: This tutorial builds upon initial work on creating a query interface over SEC 10-K filings - [check it out here](https://medium.com/@jerryjliu98/how-unstructured-and-llamaindex-can-help-bring-the-power-of-llms-to-your-own-data-3657d063e30d).\n",
    "\n",
    "### Context\n",
    "\n",
    "In this guide, we’ll build a \"10-K Chatbot\" that uses raw UBER 10-K HTML filings from Dropbox. Users can interact with the chatbot to ask questions related to the 10-K filings."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "03f3e1de",
   "metadata": {},
   "source": [
    "### Preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1211059f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import openai\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]\n",
    "\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "CuHeyb224pI2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# set text wrapping\n",
    "from IPython.display import HTML, display\n",
    "\n",
    "\n",
    "def set_css():\n",
    "    display(\n",
    "        HTML(\n",
    "            \"\"\"\n",
    "  <style>\n",
    "    pre {\n",
    "        white-space: pre-wrap;\n",
    "    }\n",
    "  </style>\n",
    "  \"\"\"\n",
    "        )\n",
    "    )\n",
    "\n",
    "\n",
    "get_ipython().events.register(\"pre_run_cell\", set_css)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "218cc812",
   "metadata": {},
   "source": [
    "### Ingest Data\n",
    "\n",
    "Let's first download the raw 10-k files, from 2019-2022."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "YC4R6nkCp91d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2023-09-22 11:13:42--  https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1\n",
      "Resolving www.dropbox.com (www.dropbox.com)... 2620:100:601f:18::a27d:912, 162.125.5.18\n",
      "Connecting to www.dropbox.com (www.dropbox.com)|2620:100:601f:18::a27d:912|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: /s/dl/948jr9cfs7fgj99/UBER.zip [following]\n",
      "--2023-09-22 11:13:43--  https://www.dropbox.com/s/dl/948jr9cfs7fgj99/UBER.zip\n",
      "Reusing existing connection to [www.dropbox.com]:443.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://uc5e96fc71f5bcad342d7ef5261b.dl.dropboxusercontent.com/cd/0/get/CEMPMHdxNS2yZDvMeO8IVhjAHBo1ExUFCUxxR3rUUAuuAn2VBlNyyyzCCERRU4Uj9cVyRgHADCluk4Kqqe1NWdxiC1Uh1u85EJEPIlVuW1gK9-KC3EcD0tD7u21w14I6d80gfspvvfKJCFzc15556zTV/file?dl=1# [following]\n",
      "--2023-09-22 11:13:43--  https://uc5e96fc71f5bcad342d7ef5261b.dl.dropboxusercontent.com/cd/0/get/CEMPMHdxNS2yZDvMeO8IVhjAHBo1ExUFCUxxR3rUUAuuAn2VBlNyyyzCCERRU4Uj9cVyRgHADCluk4Kqqe1NWdxiC1Uh1u85EJEPIlVuW1gK9-KC3EcD0tD7u21w14I6d80gfspvvfKJCFzc15556zTV/file?dl=1\n",
      "Resolving uc5e96fc71f5bcad342d7ef5261b.dl.dropboxusercontent.com (uc5e96fc71f5bcad342d7ef5261b.dl.dropboxusercontent.com)... 2620:100:601f:15::a27d:90f, 162.125.5.15\n",
      "Connecting to uc5e96fc71f5bcad342d7ef5261b.dl.dropboxusercontent.com (uc5e96fc71f5bcad342d7ef5261b.dl.dropboxusercontent.com)|2620:100:601f:15::a27d:90f|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 1820227 (1,7M) [application/binary]\n",
      "Saving to: ‘data/UBER.zip’\n",
      "\n",
      "data/UBER.zip       100%[===================>]   1,74M  3,12MB/s    in 0,6s    \n",
      "\n",
      "2023-09-22 11:13:45 (3,12 MB/s) - ‘data/UBER.zip’ saved [1820227/1820227]\n",
      "\n",
      "Archive:  data/UBER.zip\n",
      "   creating: data/UBER/\n",
      "  inflating: data/UBER/UBER_2021.html  \n",
      "  inflating: data/__MACOSX/UBER/._UBER_2021.html  \n",
      "  inflating: data/UBER/UBER_2020.html  \n",
      "  inflating: data/__MACOSX/UBER/._UBER_2020.html  \n",
      "  inflating: data/UBER/UBER_2019.html  \n",
      "  inflating: data/__MACOSX/UBER/._UBER_2019.html  \n",
      "  inflating: data/UBER/UBER_2022.html  \n",
      "  inflating: data/__MACOSX/UBER/._UBER_2022.html  \n"
     ]
    }
   ],
   "source": [
    "# NOTE: the code examples assume you're operating within a Jupyter notebook.\n",
    "# download files\n",
    "!mkdir data\n",
    "!wget \"https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1\" -O data/UBER.zip\n",
    "!unzip data/UBER.zip -d data"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a2200f83",
   "metadata": {},
   "source": [
    "To parse the HTML files into formatted text, we use the [Unstructured](https://github.com/Unstructured-IO/unstructured) library. Thanks to [LlamaHub](https://llamahub.ai/), we can directly integrate with Unstructured, allowing conversion of any text into a Document format that LlamaIndex can ingest.\n",
    "\n",
    "First we install the necessary packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc997cf8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting llama-hub\n",
      "  Obtaining dependency information for llama-hub from https://files.pythonhosted.org/packages/3f/af/3bc30c2b7ca1bdd7a193f67443539f6667a6b77dd62e54f2c5c8464ad4cb/llama_hub-0.0.31-py3-none-any.whl.metadata\n",
      "  Downloading llama_hub-0.0.31-py3-none-any.whl.metadata (8.8 kB)\n",
      "Requirement already satisfied: unstructured in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (0.10.15)\n",
      "Collecting atlassian-python-api (from llama-hub)\n",
      "  Obtaining dependency information for atlassian-python-api from https://files.pythonhosted.org/packages/ca/ed/3577ccec639736c8e4660423be68cf1a4a7040bf543b3144793760792949/atlassian_python_api-3.41.2-py3-none-any.whl.metadata\n",
      "  Downloading atlassian_python_api-3.41.2-py3-none-any.whl.metadata (8.7 kB)\n",
      "Collecting html2text (from llama-hub)\n",
      "  Downloading html2text-2020.1.16-py3-none-any.whl (32 kB)\n",
      "Requirement already satisfied: llama-index>=0.6.9 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-hub) (0.8.29.post1)\n",
      "Requirement already satisfied: psutil in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-hub) (5.9.5)\n",
      "Collecting retrying (from llama-hub)\n",
      "  Downloading retrying-1.3.4-py3-none-any.whl (11 kB)\n",
      "Requirement already satisfied: chardet in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (5.2.0)\n",
      "Requirement already satisfied: filetype in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (1.2.0)\n",
      "Requirement already satisfied: python-magic in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (0.4.27)\n",
      "Requirement already satisfied: lxml in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (4.9.3)\n",
      "Requirement already satisfied: nltk in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (3.8.1)\n",
      "Requirement already satisfied: tabulate in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (0.9.0)\n",
      "Requirement already satisfied: requests in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (2.31.0)\n",
      "Requirement already satisfied: beautifulsoup4 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (4.12.2)\n",
      "Requirement already satisfied: emoji in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (2.8.0)\n",
      "Requirement already satisfied: dataclasses-json in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from unstructured) (0.5.14)\n",
      "Requirement already satisfied: tiktoken in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (0.5.1)\n",
      "Requirement already satisfied: langchain>=0.0.293 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (0.0.295)\n",
      "Requirement already satisfied: sqlalchemy>=2.0.15 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (2.0.21)\n",
      "Requirement already satisfied: numpy in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (1.26.0)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (8.2.3)\n",
      "Requirement already satisfied: openai>=0.26.4 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (0.28.0)\n",
      "Requirement already satisfied: pandas in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (2.1.0)\n",
      "Requirement already satisfied: urllib3<2 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (1.26.16)\n",
      "Requirement already satisfied: fsspec>=2023.5.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (2023.9.1)\n",
      "Requirement already satisfied: typing-inspect>=0.8.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (0.9.0)\n",
      "Requirement already satisfied: typing-extensions>=4.5.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (4.8.0)\n",
      "Requirement already satisfied: nest-asyncio in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from llama-index>=0.6.9->llama-hub) (1.5.8)\n",
      "Collecting deprecated (from atlassian-python-api->llama-hub)\n",
      "  Obtaining dependency information for deprecated from https://files.pythonhosted.org/packages/20/8d/778b7d51b981a96554f29136cd59ca7880bf58094338085bcf2a979a0e6a/Deprecated-1.2.14-py2.py3-none-any.whl.metadata\n",
      "  Downloading Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)\n",
      "Requirement already satisfied: six in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from atlassian-python-api->llama-hub) (1.16.0)\n",
      "Requirement already satisfied: oauthlib in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from atlassian-python-api->llama-hub) (3.2.2)\n",
      "Requirement already satisfied: requests-oauthlib in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from atlassian-python-api->llama-hub) (1.3.1)\n",
      "Requirement already satisfied: soupsieve>1.2 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from beautifulsoup4->unstructured) (2.5)\n",
      "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from dataclasses-json->unstructured) (3.20.1)\n",
      "Requirement already satisfied: click in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from nltk->unstructured) (8.1.7)\n",
      "Requirement already satisfied: joblib in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from nltk->unstructured) (1.3.2)\n",
      "Requirement already satisfied: regex>=2021.8.3 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from nltk->unstructured) (2023.8.8)\n",
      "Requirement already satisfied: tqdm in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from nltk->unstructured) (4.66.1)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from requests->unstructured) (3.2.0)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from requests->unstructured) (3.4)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from requests->unstructured) (2023.7.22)\n",
      "Requirement already satisfied: PyYAML>=5.3 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (6.0.1)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (3.8.5)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (4.0.3)\n",
      "Requirement already satisfied: langsmith<0.1.0,>=0.0.38 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (0.0.38)\n",
      "Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (2.8.6)\n",
      "Requirement already satisfied: pydantic<3,>=1 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (1.10.12)\n",
      "Requirement already satisfied: packaging>=17.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses-json->unstructured) (23.1)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from sqlalchemy>=2.0.15->llama-index>=0.6.9->llama-hub) (2.0.2)\n",
      "Requirement already satisfied: mypy-extensions>=0.3.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from typing-inspect>=0.8.0->llama-index>=0.6.9->llama-hub) (1.0.0)\n",
      "Requirement already satisfied: wrapt<2,>=1.10 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from deprecated->atlassian-python-api->llama-hub) (1.15.0)\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from pandas->llama-index>=0.6.9->llama-hub) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from pandas->llama-index>=0.6.9->llama-hub) (2023.3.post1)\n",
      "Requirement already satisfied: tzdata>=2022.1 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from pandas->llama-index>=0.6.9->llama-hub) (2023.3)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (23.1.0)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (6.0.4)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (1.9.2)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (1.4.0)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /home/jtorres/llama_index/.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.293->llama-index>=0.6.9->llama-hub) (1.3.1)\n",
      "Downloading llama_hub-0.0.31-py3-none-any.whl (9.8 MB)\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m9.8/9.8 MB\u001b[0m \u001b[31m16.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
      "\u001b[?25hDownloading atlassian_python_api-3.41.2-py3-none-any.whl (167 kB)\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m167.2/167.2 kB\u001b[0m \u001b[31m20.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hDownloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)\n",
      "Installing collected packages: retrying, html2text, deprecated, atlassian-python-api, llama-hub\n",
      "Successfully installed atlassian-python-api-3.41.2 deprecated-1.2.14 html2text-2020.1.16 llama-hub-0.0.31 retrying-1.3.4\n"
     ]
    }
   ],
   "source": [
    "!pip install llama-hub unstructured"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f55a00d7",
   "metadata": {},
   "source": [
    "Then we can use the `UnstructuredReader` to parse the HTML files into a list of `Document` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5dcd0f94",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package punkt to /home/jtorres/nltk_data...\n",
      "[nltk_data]   Package punkt is already up-to-date!\n",
      "[nltk_data] Downloading package averaged_perceptron_tagger to\n",
      "[nltk_data]     /home/jtorres/nltk_data...\n",
      "[nltk_data]   Package averaged_perceptron_tagger is already up-to-\n",
      "[nltk_data]       date!\n"
     ]
    }
   ],
   "source": [
    "from llama_hub.file.unstructured.base import UnstructuredReader\n",
    "from pathlib import Path\n",
    "\n",
    "years = [2022, 2021, 2020, 2019]\n",
    "\n",
    "loader = UnstructuredReader()\n",
    "doc_set = {}\n",
    "all_docs = []\n",
    "for year in years:\n",
    "    year_docs = loader.load_data(\n",
    "        file=Path(f\"./data/UBER/UBER_{year}.html\"), split_documents=False\n",
    "    )\n",
    "    # insert year metadata into each year\n",
    "    for d in year_docs:\n",
    "        d.metadata = {\"year\": year}\n",
    "    doc_set[year] = year_docs\n",
    "    all_docs.extend(year_docs)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "312d0cfe",
   "metadata": {},
   "source": [
    "### Setting up Vector Indices for each year\n",
    "\n",
    "We first setup a vector index for each year. Each vector index allows us\n",
    "to ask questions about the 10-K filing of a given year.\n",
    "\n",
    "We build each index and save it to disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c90fafc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# initialize simple vector indices\n",
    "# NOTE: don't run this cell if the indices are already loaded!\n",
    "from llama_index import VectorStoreIndex, ServiceContext, StorageContext\n",
    "\n",
    "index_set = {}\n",
    "service_context = ServiceContext.from_defaults(chunk_size=512)\n",
    "for year in years:\n",
    "    storage_context = StorageContext.from_defaults()\n",
    "    cur_index = VectorStoreIndex.from_documents(\n",
    "        doc_set[year],\n",
    "        service_context=service_context,\n",
    "        storage_context=storage_context,\n",
    "    )\n",
    "    index_set[year] = cur_index\n",
    "    storage_context.persist(persist_dir=f\"./storage/{year}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f0704f6b",
   "metadata": {},
   "source": [
    "To load an index from disk, do the following"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7100e1b5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Load indices from disk\n",
    "from llama_index import load_index_from_storage\n",
    "\n",
    "index_set = {}\n",
    "for year in years:\n",
    "    storage_context = StorageContext.from_defaults(\n",
    "        persist_dir=f\"./storage/{year}\"\n",
    "    )\n",
    "    cur_index = load_index_from_storage(\n",
    "        storage_context, service_context=service_context\n",
    "    )\n",
    "    index_set[year] = cur_index"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0aa3f903",
   "metadata": {},
   "source": [
    "### Setting up a Sub Question Query Engine to Synthesize Answers Across 10-K Filings\n",
    "\n",
    "Since we have access to documents of 4 years, we may not only want to ask questions regarding the 10-K document of a given year, but ask questions that require analysis over all 10-K filings.\n",
    "\n",
    "To address this, we can use a [Sub Question Query Engine](https://gpt-index.readthedocs.io/en/stable/examples/query_engine/sub_question_query_engine.html). It decomposes a query into subqueries, each answered by an individual vector index, and synthesizes the results to answer the overall query.\n",
    "\n",
    "LlamaIndex provides some wrappers around indices (and query engines) so that they can be used by query engines and agents. First we define a `QueryEngineTool` for each vector index.\n",
    "Each tool has a name and a description; these are what the LLM agent sees to decide which tool to choose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce53419f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from llama_index.tools import QueryEngineTool, ToolMetadata\n",
    "\n",
    "individual_query_engine_tools = [\n",
    "    QueryEngineTool(\n",
    "        query_engine=index_set[year].as_query_engine(),\n",
    "        metadata=ToolMetadata(\n",
    "            name=f\"vector_index_{year}\",\n",
    "            description=(\n",
    "                \"useful for when you want to answer queries about the\"\n",
    "                f\" {year} SEC 10-K for Uber\"\n",
    "            ),\n",
    "        ),\n",
    "    )\n",
    "    for year in years\n",
    "]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6e8d2177",
   "metadata": {},
   "source": [
    "Now we can create the Sub Question Query Engine, which will allow us to synthesize answers across the 10-K filings. We pass in the `individual_query_engine_tools` we defined above, as well as a `service_context` that will be used to run the subqueries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c6cee32",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from llama_index.query_engine import SubQuestionQueryEngine\n",
    "\n",
    "query_engine = SubQuestionQueryEngine.from_defaults(\n",
    "    query_engine_tools=individual_query_engine_tools,\n",
    "    service_context=service_context,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "de5362b6",
   "metadata": {},
   "source": [
    "### Setting up the Chatbot Agent\n",
    "\n",
    "We use a LlamaIndex Data Agent to setup the outer chatbot agent, which has access to a set of Tools. Specifically, we will use an OpenAIAgent, that takes advantage of OpenAI API function calling. We want to use the separate Tools we defined previously for each index (corresponding to a given year), as well as a tool for the sub question query engine we defined above.\n",
    "\n",
    "First we define a `QueryEngineTool` for the sub question query engine:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f42e5a52",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "query_engine_tool = QueryEngineTool(\n",
    "    query_engine=query_engine,\n",
    "    metadata=ToolMetadata(\n",
    "        name=\"sub_question_query_engine\",\n",
    "        description=(\n",
    "            \"useful for when you want to answer queries that require analyzing\"\n",
    "            \" multiple SEC 10-K documents for Uber\"\n",
    "        ),\n",
    "    ),\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "fdcc922d",
   "metadata": {},
   "source": [
    "Then, we combine the Tools we defined above into a single list of tools for the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fad25dca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "tools = individual_query_engine_tools + [query_engine_tool]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "14219225",
   "metadata": {},
   "source": [
    "Finally, we call `OpenAIAgent.from_tools` to create the agent, passing in the list of tools we defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb01833c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from llama_index.agent import OpenAIAgent\n",
    "\n",
    "agent = OpenAIAgent.from_tools(tools, verbose=True)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9e6112d4",
   "metadata": {},
   "source": [
    "### Testing the Agent\n",
    "\n",
    "We can now test the agent with various queries.\n",
    "\n",
    "If we test it with a simple \"hello\" query, the agent does not use any Tools."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "269e6700",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Bob! How can I assist you today?\n"
     ]
    }
   ],
   "source": [
    "response = agent.chat(\"hi, i am bob\")\n",
    "print(str(response))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2fe5fb92",
   "metadata": {},
   "source": [
    "If we test it with a query regarding the 10-k of a given year, the agent will use\n",
    "the relevant vector index Tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb8226e6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Calling Function ===\n",
      "Calling function: vector_index_2020 with args: {\n",
      "  \"input\": \"biggest risk factors\"\n",
      "}\n",
      "Got output: The biggest risk factors mentioned in the context are:\n",
      "1. The adverse impact of the COVID-19 pandemic and actions taken to mitigate it on the business.\n",
      "2. The potential reclassification of drivers as employees, workers, or quasi-employees instead of independent contractors.\n",
      "3. Intense competition in the mobility, delivery, and logistics industries, with low barriers to entry and well-capitalized competitors.\n",
      "4. The need to lower fares or service fees and offer driver incentives and consumer discounts to remain competitive.\n",
      "5. Significant losses incurred and the uncertainty of achieving profitability.\n",
      "6. The risk of not attracting or maintaining a critical mass of platform users.\n",
      "7. Operational, compliance, and cultural challenges related to the workplace culture and forward-leaning approach.\n",
      "8. The potential negative impact of international investments and the challenges of conducting business in foreign countries, including operational and compliance challenges, localization requirements, restrictive laws and regulations, competition from local companies, social acceptance, technological compatibility, improper business practices, legal uncertainty, difficulties in managing international operations, currency exchange rate fluctuations, and regulations governing local currencies.\n",
      "========================\n",
      "In 2020, some of the biggest risk factors for Uber were:\n",
      "\n",
      "1. The adverse impact of the COVID-19 pandemic and the measures taken to mitigate it on the business.\n",
      "2. The potential reclassification of drivers as employees, workers, or quasi-employees instead of independent contractors.\n",
      "3. Intense competition in the mobility, delivery, and logistics industries, with low barriers to entry and well-capitalized competitors.\n",
      "4. The need to lower fares or service fees and offer driver incentives and consumer discounts to remain competitive.\n",
      "5. Significant losses incurred and uncertainty about achieving profitability.\n",
      "6. The risk of not attracting or maintaining a critical mass of platform users.\n",
      "7. Operational, compliance, and cultural challenges related to the workplace culture and forward-leaning approach.\n",
      "8. The potential negative impact of international investments and the challenges of conducting business in foreign countries, including operational and compliance challenges, localization requirements, restrictive laws and regulations, competition from local companies, social acceptance, technological compatibility, improper business practices, legal uncertainty, difficulties in managing international operations, currency exchange rate fluctuations, and regulations governing local currencies.\n",
      "\n",
      "These risk factors highlight the challenges and uncertainties faced by Uber in 2020.\n"
     ]
    }
   ],
   "source": [
    "response = agent.chat(\n",
    "    \"What were some of the biggest risk factors in 2020 for Uber?\"\n",
    ")\n",
    "print(str(response))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "78ac181f",
   "metadata": {},
   "source": [
    "Finally, if we test it with a query to compare/contrast risk factors across years, the agent will use the Sub Question Query Engine Tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72e475bf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Calling Function ===\n",
      "Calling function: sub_question_query_engine with args: {\n",
      "  \"input\": \"Compare/contrast the risk factors described in the Uber 10-K across years\"\n",
      "}\n",
      "Generated 4 sub questions.\n",
      "\u001b[36;1m\u001b[1;3m[vector_index_2022] Q: What are the risk factors described in the 2022 SEC 10-K for Uber?\n",
      "\u001b[0m\u001b[33;1m\u001b[1;3m[vector_index_2021] Q: What are the risk factors described in the 2021 SEC 10-K for Uber?\n",
      "\u001b[0m\u001b[38;5;200m\u001b[1;3m[vector_index_2020] Q: What are the risk factors described in the 2020 SEC 10-K for Uber?\n",
      "\u001b[0m\u001b[32;1m\u001b[1;3m[vector_index_2019] Q: What are the risk factors described in the 2019 SEC 10-K for Uber?\n",
      "\u001b[0m\u001b[36;1m\u001b[1;3m[vector_index_2022] A: The risk factors described in the 2022 SEC 10-K for Uber include the potential adverse effect on their business if drivers were classified as employees instead of independent contractors, the highly competitive nature of the mobility, delivery, and logistics industries, the need to lower fares or service fees to remain competitive in certain markets, the company's history of significant losses and the expectation of increased operating expenses in the future, and the potential impact on their platform if they are unable to attract or maintain a critical mass of drivers, consumers, merchants, shippers, and carriers.\n",
      "\u001b[0m\u001b[32;1m\u001b[1;3m[vector_index_2019] A: The risk factors described in the 2019 SEC 10-K for Uber include the loss of their license to operate in London, the complexity of their business and operating model due to regulatory uncertainties, the potential for additional regulations for their other products in the Other Bets segment, the evolving laws and regulations regarding the development and deployment of autonomous vehicles, and the increasing number of data protection and privacy laws around the world. Additionally, there are legal proceedings, litigation, claims, and government investigations that Uber is involved in, including those related to the classification of drivers and compliance with applicable laws, which could impose a significant burden on the company.\n",
      "\u001b[0m\u001b[33;1m\u001b[1;3m[vector_index_2021] A: The risk factors described in the 2021 SEC 10-K for Uber include the adverse impact of the COVID-19 pandemic and actions taken to mitigate it on their business, the potential reclassification of drivers as employees instead of independent contractors, intense competition in the mobility, delivery, and logistics industries, the need to lower fares and offer incentives to remain competitive, significant losses incurred and the expectation of increased operating expenses, the importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers, and the uncertainty surrounding the impact of COVID-19 on their business and financial position. Additionally, the classification of drivers is being challenged in courts and by government agencies, which could have legal and financial implications for the company.\n",
      "\u001b[0m\u001b[38;5;200m\u001b[1;3m[vector_index_2020] A: The risk factors described in the 2020 SEC 10-K for Uber include the adverse impact of the COVID-19 pandemic on their business, the potential reclassification of drivers as employees instead of independent contractors, intense competition in the mobility, delivery, and logistics industries, the need to lower fares and offer incentives to remain competitive, significant losses and the uncertainty of achieving profitability, the importance of attracting and maintaining a critical mass of platform users, operational and compliance challenges, inquiries and investigations from government agencies, risks related to data security breaches, the need to introduce new or upgraded products and features, and the need to invest in the development of new offerings to retain and attract users.\n",
      "\u001b[0mGot output: The risk factors described in the Uber 10-K reports across the years include the potential reclassification of drivers as employees instead of independent contractors, intense competition in the mobility, delivery, and logistics industries, the need to lower fares and offer incentives to remain competitive, significant losses incurred and the expectation of increased operating expenses, the importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers, and the impact of the COVID-19 pandemic on their business. Additionally, there are legal and regulatory uncertainties, such as the evolving laws and regulations regarding autonomous vehicles, data protection and privacy laws, and the potential for additional regulations for their other products. The reports also mention the operational and compliance challenges, inquiries and investigations from government agencies, and the risks associated with data security breaches. It is worth noting that specific risk factors may vary from year to year based on the prevailing circumstances and developments in the industry and regulatory environment.\n",
      "========================\n",
      "Here are the key points comparing and contrasting the risk factors described in the Uber 10-K reports across years:\n",
      "\n",
      "2022:\n",
      "- Potential reclassification of drivers as employees instead of independent contractors.\n",
      "- Intense competition in the mobility, delivery, and logistics industries.\n",
      "- Need to lower fares and offer incentives to remain competitive.\n",
      "- Significant losses incurred and expectation of increased operating expenses.\n",
      "- Importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.\n",
      "- Impact of the COVID-19 pandemic on their business.\n",
      "- Legal and regulatory uncertainties, including evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.\n",
      "- Operational and compliance challenges.\n",
      "- Inquiries and investigations from government agencies.\n",
      "- Risks associated with data security breaches.\n",
      "\n",
      "2021:\n",
      "- Similar risk factors as in 2022, including potential reclassification of drivers, intense competition, need to lower fares, significant losses, and the impact of the COVID-19 pandemic.\n",
      "- Emphasis on the importance of maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.\n",
      "- Mention of legal and regulatory uncertainties, such as evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.\n",
      "- Operational and compliance challenges.\n",
      "- Inquiries and investigations from government agencies.\n",
      "- Risks associated with data security breaches.\n",
      "\n",
      "2020:\n",
      "- Similar risk factors as in 2021, including potential reclassification of drivers, intense competition, need to lower fares, significant losses, and the impact of the COVID-19 pandemic.\n",
      "- Emphasis on the importance of maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.\n",
      "- Mention of legal and regulatory uncertainties, such as evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.\n",
      "- Operational and compliance challenges.\n",
      "- Inquiries and investigations from government agencies.\n",
      "- Risks associated with data security breaches.\n",
      "\n",
      "2019:\n",
      "- Similar risk factors as in 2020, including potential reclassification of drivers, intense competition, need to lower fares, significant losses, and the impact of the COVID-19 pandemic.\n",
      "- Emphasis on the importance of maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.\n",
      "- Mention of legal and regulatory uncertainties, such as evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.\n",
      "- Operational and compliance challenges.\n",
      "- Inquiries and investigations from government agencies.\n",
      "- Risks associated with data security breaches.\n",
      "\n",
      "Please note that these are just the key points, and there may be additional risk factors mentioned in each year's 10-K report.\n"
     ]
    }
   ],
   "source": [
    "cross_query_str = (\n",
    "    \"Compare/contrast the risk factors described in the Uber 10-K across\"\n",
    "    \" years. Give answer in bullet points.\"\n",
    ")\n",
    "\n",
    "response = agent.chat(cross_query_str)\n",
    "print(str(response))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1153ee23",
   "metadata": {},
   "source": [
    "### Setting up the Chatbot Loop\n",
    "\n",
    "Now that we have the chatbot setup, it only takes a few more steps to setup a basic interactive loop to chat with our SEC-augmented chatbot!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fa14fa6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <style>\n",
       "    pre {\n",
       "        white-space: pre-wrap;\n",
       "    }\n",
       "  </style>\n",
       "  "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent: In 2022, Uber is facing several legal proceedings. Here are some of them:\n",
      "\n",
      "1. California: The state Attorney General and city attorneys filed a complaint against Uber and Lyft, alleging that drivers are misclassified as independent contractors. A preliminary injunction was issued but stayed pending appeal. The Court of Appeal affirmed the lower court's ruling, and Uber filed a petition for review with the California Supreme Court. However, the Supreme Court declined the petition for review. The lawsuit is ongoing, focusing on claims by the California Attorney General for periods prior to the enactment of Proposition 22.\n",
      "\n",
      "2. Massachusetts: The Attorney General of Massachusetts filed a complaint against Uber, alleging that drivers are employees entitled to wage and labor law protections. Uber's motion to dismiss the complaint was denied, and a summary judgment motion is pending.\n",
      "\n",
      "3. New York: Uber is facing allegations of misclassification and employment violations by the state Attorney General. The resolution of this matter is uncertain.\n",
      "\n",
      "4. Switzerland: Several administrative bodies in Switzerland have issued rulings classifying Uber drivers as employees for social security or labor purposes. Uber is challenging these rulings before the Social Security and Administrative Tribunals.\n",
      "\n",
      "These are some of the legal proceedings against Uber in 2022. The outcomes and potential losses in these cases are uncertain.\n"
     ]
    }
   ],
   "source": [
    "agent = OpenAIAgent.from_tools(tools)  # verbose=False by default\n",
    "\n",
    "while True:\n",
    "    text_input = input(\"User: \")\n",
    "    if text_input == \"exit\":\n",
    "        break\n",
    "    response = agent.chat(text_input)\n",
    "    print(f\"Agent: {response}\")\n",
    "\n",
    "# User: What were some of the legal proceedings against Uber in 2022?"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "llama",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "0a8d7bc06ed646b78956101c5c62fb1f": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "FloatProgressModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_112ef61e952444369d6c0dce1d1098a8",
      "max": 456318,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_afe741aff3f54ae6a89b5965888599f4",
      "value": 456318
     }
    },
    "0cccb700fc264051abc0b7458716054f": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "0cd2bf060b4c4daeaf9156705ae409ab": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HBoxModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_3c4ce2ec8ae54042864d1d0261355166",
       "IPY_MODEL_2e27352e29354001b53bc4d07bd4c1ed",
       "IPY_MODEL_709f853d0ae846e7945c74ff83cae52e"
      ],
      "layout": "IPY_MODEL_54a3bafa2af54475abe57e37e2399a45"
     }
    },
    "0dc1dea99b774b74a84c9c02b6335b94": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_3b4a5130401a44daa116828b9fa8d17b",
      "placeholder": "​",
      "style": "IPY_MODEL_21efd9c072db42bf89ad5dbd96b52a01",
      "value": "Downloading (…)lve/main/config.json: 100%"
     }
    },
    "10420e2dba3244f48c8fcd7d35904e41": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "112ef61e952444369d6c0dce1d1098a8": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "1a259e1a45bc4043b81243d033b79752": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HBoxModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_f0963a05a29d4e6f8a2ab48fc8a26a1d",
       "IPY_MODEL_0a8d7bc06ed646b78956101c5c62fb1f",
       "IPY_MODEL_ef339a504f984cc2ac07c8252aa2f6d5"
      ],
      "layout": "IPY_MODEL_3d3c2d54cdf44a06a99c35d0e1b33b0a"
     }
    },
    "21efd9c072db42bf89ad5dbd96b52a01": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "2d7895836c814084be531103d0218650": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "ProgressStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "2e27352e29354001b53bc4d07bd4c1ed": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "FloatProgressModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_792a299872704c10af8061175db96031",
      "max": 1355256,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_2d7895836c814084be531103d0218650",
      "value": 1355256
     }
    },
    "37cf19bcc68d41d7bc6b411cd273a88c": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_8e1f2102db184538a35ed9c8498d5bbc",
      "placeholder": "​",
      "style": "IPY_MODEL_0cccb700fc264051abc0b7458716054f",
      "value": " 1.04M/1.04M [00:00&lt;00:00, 3.35MB/s]"
     }
    },
    "387d0ca6bf2c4341804cfa43a71b6451": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "3b4a5130401a44daa116828b9fa8d17b": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "3c4ce2ec8ae54042864d1d0261355166": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_94343417f60a4f9d9a745dfa2c967893",
      "placeholder": "​",
      "style": "IPY_MODEL_f6331afe201d49319b76e528e01a9b4b",
      "value": "Downloading (…)/main/tokenizer.json: 100%"
     }
    },
    "3d3c2d54cdf44a06a99c35d0e1b33b0a": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "3e45628539864cd2b1297c2be1d35c98": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "4539f47e9e4c444e9f610dea60db5eb0": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "548c125bc716475ba6708fa80f5b1b61": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "FloatProgressModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_387d0ca6bf2c4341804cfa43a71b6451",
      "max": 1042301,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_b1c406aebffa49f48c3457a9482a47a3",
      "value": 1042301
     }
    },
    "54a3bafa2af54475abe57e37e2399a45": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "5edb09d55e7b45c881bbf6a38de92f60": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "FloatProgressModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_d75f60b7f0b94b3d88046ac990446ad4",
      "max": 665,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_609d29b907424f049de6ef0b86a6541d",
      "value": 665
     }
    },
    "609d29b907424f049de6ef0b86a6541d": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "ProgressStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "6dc9a8ca9270413893e28e3a3807dd51": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "709f853d0ae846e7945c74ff83cae52e": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_10420e2dba3244f48c8fcd7d35904e41",
      "placeholder": "​",
      "style": "IPY_MODEL_d7b1b50d3f6843be90e00d3352139b2f",
      "value": " 1.36M/1.36M [00:00&lt;00:00, 3.65MB/s]"
     }
    },
    "755797e4682642399c367962ba737504": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_eb486eb0dc2f44eabcac46a18b711ff1",
      "placeholder": "​",
      "style": "IPY_MODEL_3e45628539864cd2b1297c2be1d35c98",
      "value": "Downloading (…)olve/main/vocab.json: 100%"
     }
    },
    "792a299872704c10af8061175db96031": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "7bd8740ee12a486fbb3d3ae0a2c493b8": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_6dc9a8ca9270413893e28e3a3807dd51",
      "placeholder": "​",
      "style": "IPY_MODEL_4539f47e9e4c444e9f610dea60db5eb0",
      "value": " 665/665 [00:00&lt;00:00, 18.8kB/s]"
     }
    },
    "7e96275c58434a04a49e56598b12656a": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "8e1f2102db184538a35ed9c8498d5bbc": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "94343417f60a4f9d9a745dfa2c967893": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "9bf6e3e259cd4bc0bd4c714c725d0782": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "a292f2af029c4340813e63370bf73f12": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HBoxModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_755797e4682642399c367962ba737504",
       "IPY_MODEL_548c125bc716475ba6708fa80f5b1b61",
       "IPY_MODEL_37cf19bcc68d41d7bc6b411cd273a88c"
      ],
      "layout": "IPY_MODEL_ebc4716ea8d849c58c43c26f5caf18fe"
     }
    },
    "a8bae11fb1aa40348e611a1592e0ff0a": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "afe741aff3f54ae6a89b5965888599f4": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "ProgressStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "b1c406aebffa49f48c3457a9482a47a3": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "ProgressStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "b2653a474569425ebb546f93241a465e": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "bbdfb7b736f546fea9f2a56a815a8e8f": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HBoxModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_0dc1dea99b774b74a84c9c02b6335b94",
       "IPY_MODEL_5edb09d55e7b45c881bbf6a38de92f60",
       "IPY_MODEL_7bd8740ee12a486fbb3d3ae0a2c493b8"
      ],
      "layout": "IPY_MODEL_fa9eff66d6084f0a970408b620afa686"
     }
    },
    "d75f60b7f0b94b3d88046ac990446ad4": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "d7b1b50d3f6843be90e00d3352139b2f": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "eb486eb0dc2f44eabcac46a18b711ff1": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "ebc4716ea8d849c58c43c26f5caf18fe": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "ef339a504f984cc2ac07c8252aa2f6d5": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_a8bae11fb1aa40348e611a1592e0ff0a",
      "placeholder": "​",
      "style": "IPY_MODEL_9bf6e3e259cd4bc0bd4c714c725d0782",
      "value": " 456k/456k [00:00&lt;00:00, 1.84MB/s]"
     }
    },
    "f0963a05a29d4e6f8a2ab48fc8a26a1d": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_7e96275c58434a04a49e56598b12656a",
      "placeholder": "​",
      "style": "IPY_MODEL_b2653a474569425ebb546f93241a465e",
      "value": "Downloading (…)olve/main/merges.txt: 100%"
     }
    },
    "f6331afe201d49319b76e528e01a9b4b": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "fa9eff66d6084f0a970408b620afa686": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
